COVID-19 in nursing homes: Geographic diffusion and regional risk factors from January 1 to July 26, 2020 of the pandemic

Background COVID-19 deaths in nursing homes accounted for 30.2% of all COVID-19 deaths in the United States during the early months (1-January to 26-July, 2020) of the pandemic. This study presents the geographic diffusion of COVID-19 cases and deaths in nursing homes during this time period and examines regional risk factors. Methods and findings Nursing home COVID-19 data on confirmed cases (n = 173,452) and deaths (n = 46,173) were obtained from the Centers for Medicare and Medicaid Services. Weekly COVID-19 case counts were spatially smoothed to identify nursing homes in areas of high COVID-19 infection. Bivariate spatial autocorrelation was used to visualize High vs. Low-case counts and related deaths. Zero-inflated negative binomial models were estimated within Health and Human Service (HHS) Regions at three-week intervals to evaluate facility and area-level risk factors. The first reported nursing home resident to die of COVID-19 was in the state of Washington on 28-February, 2020. By 24-May, 2020 there were simultaneous epicenters in the Northeast (HHS Regions 1 and 2) and Midwest (HHS Region 5) with diffusion into the South (HHS Regions 4 and 6) from 15-June to 5-July, 2020. The case-fatality rate was highest from 25-May to 14-June, 2020 (30.9 deaths per 100 confirmed cases), thereafter declining to 24.1 (15-June to 5-July, 2020) and 19.4 (6-July to 26-July, 2020) (overall case-fatality rate 1-January to 26-July = 26.6). Statistically significant risk factors for COVID-19 deaths were admission of patients with COVID-19 into nursing homes, staff confirmed infections and nursing shortages. COVID-19 deaths were likely to occur in nursing homes in high minority and non-English speaking neighborhoods and neighborhoods with a high proportion of households with disabilities.
Conclusions Enhanced communication between HHS regional administrators about “lessons learned” could provide receiving state health departments with timely information to inform clinical practice to prevent premature death in nursing homes in future pandemics.


Introduction
The first case of COVID-19 isolated in a nursing home occurred on 28-February, 2020. The World Health Organization (WHO) declared COVID-19 a global pandemic on 11-March, 2020. On 13-March, 2020, the Centers for Medicare & Medicaid Services (CMS) restricted all visitors and non-essential health care personnel from entering nursing homes under the "Guidance for Limiting the Transmission of COVID-19 for Nursing Homes" [1,2]. Protocols for the use of personal protective equipment (PPE) for health care workers and surveyors and facility guidelines were established, including: SARS-CoV-2 testing guidance; when nursing homes should consider transferring a resident with suspected or confirmed COVID-19 infection to a hospital; and when a nursing home should accept a resident diagnosed with COVID-19 from a hospital [3]. National surveillance from mandatory reporting to the CMS of COVID-19 suspected and confirmed cases and deaths showed that from 1-January to 26-July, 2020 there were 173,452 confirmed COVID-19 cases reported from nursing homes, including 46,173 COVID-19 related deaths (case-fatality rate = 26.6 per 100 confirmed cases). By 26-July, 2020 nursing home deaths accounted for 30.2% of total COVID-19 deaths in the United States [1,4].
Earlier studies on the impact of the pandemic on nursing home residents used pre-surveillance sources of COVID-19 case and death data and the CMS COVID-19 Nursing Home Data for the United States [5-10], and included state-specific case studies for Connecticut [11] and California [12,13], a 23-state study using data from health departments on 8,943 nursing homes [14], county and facility-level studies [15] and a facility-specific case study in Washington [16]. Across these studies, there were several common facility-level risk factors for COVID-19 transmission, including larger facility size [5,8,12], nursing staff shortages [7-9, 11, 12, 14], staff who worked while symptomatic [16] or who worked in more than one facility [16], limited testing availability [16], for-profit vs. non-profit ownership [5,9,10,13,14], a higher percentage of Medicaid residents [5,11,14] and a higher number of total facility deficiencies or penalties [8,12,14]. In terms of patient demographics, Li et al. [11] found that nursing homes in Connecticut with a high proportion of minority residents had 15.0% higher confirmed COVID-19 cases than comparable facilities. He et al. [13] also found that nursing homes in California with a higher proportion of non-white residents had higher odds of COVID-19 cases and deaths. None of these studies specifically investigated the transfer of hospitalized patients with COVID-19 to nursing homes for additional care, and its potential to amplify spread in nursing homes.
The social context of nursing homes was also an important risk factor for COVID-19 transmission. Two studies, Abrams et al. [5] and Travers et al. [15], found that nursing homes in areas with high vs. low shares of Black residents had higher odds of COVID-19 presence; however, White et al. [6] did not find significant racial differences in county-level COVID-19 risk. Abrams et al. [5] found that nursing homes in urban vs. rural areas had higher odds of COVID-19 presence and that, as COVID-19 increased in the community, the likelihood of a nursing home outbreak also increased. Chatterjee et al. [14] and Hege et al. [9] also found an increase in COVID-19 reporting in nursing homes in counties with higher COVID-19 rates. Travers et al. [15] found that county-level resources, rurality and counties with a high proportion of Black residents in part explained high rates of COVID-19 and deaths in nursing homes nationwide.
These studies demonstrated the need to understand facility- and contextual-level risk factors for COVID-19 within and across nursing homes, and thereby reduce the likelihood of premature deaths in future pandemics.

Purpose of study
The purposes of this study were to utilize the COVID-19 nursing home surveillance data to (a) document the spatial patterns and diffusion of confirmed COVID-19 cases and deaths from 1-January to 26-July, 2020 and (b) examine facility and contextual-level risk factors for COVID-19 deaths in nursing homes to explain these trends. To ensure an adequate sample (and power) to detect risk factors for COVID-19 deaths, this study investigated nursing home risks within the U.S. Health and Human Service (HHS) regions [17] at three-week intervals. The findings from this retrospective cross-sectional study were intended to inform HHS regional administrators and state health departments to prevent transmission and premature deaths in U.S. nursing homes in future pandemics.

Data
Nursing home COVID-19 data on confirmed cases (n = 173,452) and deaths (n = 46,173) between 1-January and 26-July, 2020 were obtained from the Nursing Home COVID-19 Public File, Centers for Medicare and Medicaid Services (CMS)-Division of Nursing Homes/Quality, Safety, and Oversight Group/Center for Clinical Standards and Quality (Data.CMS.gov) [4]. These data included reports by nursing homes to the CDC's National Healthcare Safety Network (NHSN) System COVID-19 Long Term Care Facility Module [18]. Individual records were from each certified Medicare skilled nursing facility/Medicaid nursing and long-term care facility, with data for each collection week. The authors did not have information that could identify individuals within nursing homes and thus the data used for this study were exempt from IRB review.
The dependent variable in subsequent statistical analyses was the count of COVID-19 deaths among residents (ResidentsWeeklyCOVID19Deaths) in nursing homes within HHS regions. Explanatory variables of interest included (a) the count of COVID-19 confirmed and suspected cases, staff diagnosed with COVID-19 and admissions of COVID-19 patients from hospitals into nursing homes, to investigate levels of infectivity in nursing homes and the potential of COVID-19 transmission, (b) the count of nursing shortages, to assess a potential barrier to care, and (c) the facility conditions, such as the availability of testing and the presence or absence of a ventilator-dependent unit in the nursing home.
Also, the neighborhood location of nursing homes was used to assess potential vulnerability, following studies that have shown the importance of understanding contextual-level risks and the potential of community-level COVID-19 infectivity rates (1 = urbanized-area, 0 = all others; 1 = urban-cluster, 0 = all others; 1 = rural, 0 = all others), with urbanized areas defined as 50,000 or more people, urban clusters defined as at least 2,500 and less than 50,000 people, and rural areas having less than 2,500 people. Social vulnerability was measured using the Centers for Disease Control and Prevention's Social Vulnerability Index [19,20], which included four continuous themes: (1) low socio-economic status (including low education, employment and poverty), (2) household composition (including a high proportion of dependency, single parenting and disability), (3) high minority and non-English speaking residents, and (4) crowded housing with low access to automobile transportation. For each census tract, the percentile rank across all tracts for each of the four themes was used in this study. The nursing home point locations were spatially joined to the 2010 urban boundary [21] and the Social Vulnerability Index at the census tract level in ArcGIS v. 10.8 [22] to assign these contextual attributes to each nursing home. Finally, new variables were created, including the bed-level of the nursing home using the number of beds listed and, if missing, the number of occupied beds (base = <50 beds, small = 50-99 beds, mid-sized = 100-249 beds, and large = 250+ beds) [23]. Veteran's Nursing Homes were also coded (1 = yes, 0 = no). Nursing homes within and across states were studied within HHS Regions (see Fig 1).
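The coding rules described above can be illustrated with a minimal Python sketch. It assumes only the population thresholds and bed-size bands stated in the text; the function names and category labels are hypothetical and are not taken from the study's own code.

```python
from typing import Dict, Optional


def classify_area(population: int) -> str:
    """Urban/rural class from the population thresholds stated in the text."""
    if population >= 50_000:
        return "urbanized-area"
    if population >= 2_500:
        return "urban-cluster"
    return "rural"


def area_dummies(population: int) -> Dict[str, int]:
    """One 1/0 indicator per class (1 = this class, 0 = all others)."""
    cls = classify_area(population)
    names = ("urbanized-area", "urban-cluster", "rural")
    return {name: int(cls == name) for name in names}


def bed_level(listed_beds: Optional[int], occupied_beds: Optional[int]) -> str:
    """Bed-size band; falls back to occupied beds when listed beds are missing."""
    beds = listed_beds if listed_beds is not None else occupied_beds
    if beds is None:
        raise ValueError("no bed count available")
    if beds < 50:
        return "<50"
    if beds < 100:
        return "50-99"
    if beds < 250:
        return "100-249"
    return "250+"
```

For example, a facility with no listed bed count and 120 occupied beds would fall in the 100-249 band, and a place of 3,000 people would be coded as an urban cluster.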

Spatial diffusion
The nursing homes' geocoded point locations were used to spatially smooth the confirmed case counts using kernel density in ArcGIS v. 10.8 [22] with a default grid cell size and 1 square-mile bandwidth parameters. Bivariate spatial autocorrelation was used to detect nursing homes with clusters of High-case counts + High-deaths, Low-case counts + High-deaths and Low-case counts + Low-deaths. These case-death clusters were overlaid onto the spatially smoothed case data to visualize the spatial patterns of COVID-19 diffusion across nursing homes. These spatial analyses were conducted for the time period 1-January to 24-May, 2020 and weekly thereafter through 26-July, 2020.
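To illustrate what kernel density smoothing of case counts does, the following is a minimal sketch rather than the procedure run in ArcGIS. It assumes a quartic kernel of the form commonly documented for point kernel density tools, planar coordinates in miles, and a search radius supplied by the caller; the function name and the sample coordinates are hypothetical.

```python
import math
from typing import Iterable, Tuple


def quartic_kernel_density(
    x: float,
    y: float,
    points: Iterable[Tuple[float, float, float]],
    radius: float,
) -> float:
    """Smoothed case density at (x, y).

    points holds (px, py, case_count) for each nursing home; only homes
    closer than `radius` contribute, with weight falling to zero at the edge.
    """
    total = 0.0
    for px, py, cases in points:
        dist = math.hypot(px - x, py - y)
        if dist < radius:
            total += (3.0 / math.pi) * cases * (1.0 - (dist / radius) ** 2) ** 2
    return total / radius ** 2


# Hypothetical facilities: (x in miles, y in miles, confirmed cases)
homes = [(0.0, 0.0, 12.0), (0.4, 0.3, 5.0), (3.0, 3.0, 20.0)]
density_at_origin = quartic_kernel_density(0.0, 0.0, homes, radius=1.0)
```

In this sketch the third facility lies outside the search radius and does not contribute to the density at the origin, which is how smoothing highlights local concentrations of cases rather than isolated large counts elsewhere.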

Statistical analysis
An examination of the weekly counts of COVID-19 deaths revealed that they were highly right skewed with a high percentage of records listed as zeros. Zeros in the data set were thus viewed as either structural (i.e., due to a sub-population of residents who were not at risk of COVID-19 transmission) or arising from a sample of residents who were at risk of COVID-19 but still produced a zero outcome, perhaps due to testing variability [24] (e.g., antibody testing too soon post-exposure). In this study, zeros were modeled as a latent variable as outlined below. Zero-inflated negative binomial (ZINB) models were estimated using 'proc countreg' in SAS v. 9.3 [25] to account for overdispersion (variance substantially larger than the mean) and an excess of zeros [26]. If significant dispersion was not observed, a zero-inflated Poisson (ZIP) model was estimated. These models had two components, one for each of the two distributions: a negative binomial model to predict the counts of COVID-19 deaths and a logistic model to predict the excess zeros. The count of occupied beds in nursing homes was used as an offset in the model. Bivariate Pearson correlation for the continuous counts of data (proc corr) was used to assess whether a statistically significant linear relationship existed between two continuous variables, and the direction and strength of the relationship. These findings informed the parameterization of the ZINB and ZIP models; in particular, those coefficients with large standard errors. The models were fit by maximum likelihood. The smallest Akaike Information Criterion (AIC) and Schwarz's Bayesian information criterion (SBC) were used to identify the best fit model.
These data were grouped into four distinct time periods for statistical analysis at the HHS regional level: 1-January to 24-May, 2020 and, thereafter, three-week intervals of 25-May to 14-June, 2020, 15-June to 5-July, 2020 and 6-July to 26-July, 2020. These groupings (excluding the first time period, which included back-logged records) were derived from knowledge of the disease continuum, i.e., 2-14 days from exposure to symptoms, plus one week to potential death. The benefits of using a 21-day window rather than a weekly time-series were higher case counts and power for statistical analyses, and it accounted for potential variation in reporting lags.
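The two-component structure of the ZINB model can be made concrete with a short sketch of its probability mass function. This is an illustration under stated assumptions, not the SAS implementation: it uses the common NB2 parameterization (variance = mu + alpha * mu^2), treats the offset as the log of occupied beds with a coefficient fixed at 1, and all function names are hypothetical.

```python
import math


def nb_log_pmf(y: int, mu: float, alpha: float) -> float:
    """Log-probability of count y under a negative binomial (NB2) model."""
    r = 1.0 / alpha  # dispersion expressed as the NB "size" parameter
    return (
        math.lgamma(y + r) - math.lgamma(r) - math.lgamma(y + 1)
        + r * math.log(r / (r + mu))
        + y * math.log(mu / (r + mu))
    )


def zinb_pmf(y: int, mu: float, alpha: float, pi_zero: float) -> float:
    """Mixture: excess zero with probability pi_zero, NB count otherwise."""
    nb = math.exp(nb_log_pmf(y, mu, alpha))
    if y == 0:
        return pi_zero + (1.0 - pi_zero) * nb
    return (1.0 - pi_zero) * nb


def expected_count(occupied_beds: float, linear_predictor: float) -> float:
    """Mean of the count component with occupied beds as the exposure."""
    return occupied_beds * math.exp(linear_predictor)


def aic(log_likelihood: float, n_params: int) -> float:
    """Akaike Information Criterion; smaller is better."""
    return 2.0 * n_params - 2.0 * log_likelihood


def sbc(log_likelihood: float, n_params: int, n_obs: int) -> float:
    """Schwarz's Bayesian criterion; smaller is better."""
    return n_params * math.log(n_obs) - 2.0 * log_likelihood
```

With mu = 2, alpha = 0.5 and an excess-zero probability of 0.3, the negative binomial component alone gives a zero probability of 0.25, while the zero-inflated mixture gives 0.3 + 0.7 x 0.25 = 0.475, which shows how the logistic component absorbs zeros beyond those the count model expects.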
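As a small illustration of the grouping above, weekly records could be assigned to the four analysis periods as follows. The period boundaries are those stated in the text; the function and variable names are hypothetical, and the sketch assumes each record is keyed by a single reporting date.

```python
from datetime import date
from typing import Optional

# Analysis periods as stated in the text (inclusive start and end dates).
PERIODS = [
    ("1-January to 24-May, 2020", date(2020, 1, 1), date(2020, 5, 24)),
    ("25-May to 14-June, 2020", date(2020, 5, 25), date(2020, 6, 14)),
    ("15-June to 5-July, 2020", date(2020, 6, 15), date(2020, 7, 5)),
    ("6-July to 26-July, 2020", date(2020, 7, 6), date(2020, 7, 26)),
]


def assign_period(report_date: date) -> Optional[str]:
    """Return the analysis period containing report_date, or None if outside."""
    for label, start, end in PERIODS:
        if start <= report_date <= end:
            return label
    return None
```

Each of the three later periods spans exactly 21 days, matching the three-week window described above.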

Results
The national case-fatality rate (COVID-19 deaths per 100 confirmed COVID-19 cases) from 1-January to 26-July, 2020 was 26.6, with rates in Regions 1, 2, 3, 5 and 8 higher than the national average. Summary data on nursing home case-fatality rates by HHS Region are presented for four time periods. HHS Regions with case-fatality rates above the national average were Regions 1, 2, 8 and 10 for 1-January to 24-May, 2020 (national average 28.2); Regions 1, 3, 5, 7 and 10 for 25-May to 14-June, 2020 (30.8); Regions 1, 2, 3 and 5 for 15-June to 5-July, 2020 (24.1); and Regions 1, 2, 3, 5 and 7 for 6-July to 26-July, 2020 (19.4). These findings demonstrate that nursing home case-fatality rates fluctuated, with epicenters in the Northeast and Midwest early in the pandemic. The case-fatality rates for states within these HHS Regions are presented in S1 Table.
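The case-fatality rate as defined here is a simple ratio, and the national figure can be reproduced from the totals reported in this study. The sketch below is illustrative only; the function name is hypothetical.

```python
def case_fatality_rate(deaths: int, confirmed_cases: int) -> float:
    """COVID-19 deaths per 100 confirmed COVID-19 cases."""
    if confirmed_cases <= 0:
        raise ValueError("confirmed_cases must be positive")
    return 100.0 * deaths / confirmed_cases


# National totals for 1-January to 26-July, 2020 as reported in the text.
national_cfr = case_fatality_rate(46_173, 173_452)
print(round(national_cfr, 1))  # 26.6
```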

1-January to 24-May, 2020
From 1-January to 24-May, 2020 nursing homes with High-case counts + High-deaths and Low-case counts + High-deaths were clustered in states along the eastern seaboard, in urban areas of the Midwest and dispersed in some cities in the South. Clusters of nursing homes with High-case counts + Low-deaths were largely outside of cities and in rural areas (see Fig 3).
Table 1 shows factors that underlie these trends from 1-January to 24-May, 2020. Specifically, confirmed COVID-19 infection among residents and staff diagnosed with COVID-19 were significantly associated with COVID-19 deaths in nursing homes in HHS Regions 1, 2, 3, 5 and 9. Admission of COVID-19 patients from hospitals into nursing homes was also a significant risk factor in HHS Regions 2 and 5. There were significantly higher deaths in mid-sized nursing homes (100-249 beds) compared to small nursing homes (<50 beds) in HHS Regions 2 and 3, a phenomenon not seen in large nursing homes (250+ beds) except in Region 5. Neighborhood social vulnerability was not a significant risk factor for COVID-19 deaths in nursing homes from 1-January to 24-May, 2020.

25-May to 14-June, 2020
From 25-May to 31-May, 2020 clusters of High-case counts + High-deaths and Low-case counts + High-deaths expanded in states along the eastern seaboard, in urban areas in the Midwest and in more cities in the South. There was, however, an increase in clusters of High-case counts + Low-deaths also emerging in the Northeast (see Fig 4A). These patterns persisted the following week (1-June to 7-June, 2020) and the week thereafter (8-June to 14-June, 2020) (see Fig 4B and 4C), where greater expansive diffusion of COVID-19 cumulative cases per square mile within states was also observed in the Northeast and Midwest.
Table 2 shows factors that underlie these trends from 25-May to 14-June, 2020. While confirmed COVID-19 cases were a significant risk factor for COVID-19 deaths in HHS Regions 2, 4 and 5, staff confirmed with COVID-19 was a significant risk factor in HHS Regions 1, 2, 3, 4 and 5, demonstrating diffusion of staff confirmed infections into the South. The admission of COVID-19 patients from hospitals to nursing homes continued to be a risk factor in Region 5. During this time period, nursing shortages also emerged as a significant risk factor in Region 5. In HHS Region 2 the risk of death in mid-sized (100-249 beds) and large (250+ beds) nursing homes was not significantly greater than in small (<50 beds) nursing homes, which was a change from the prior reporting period. During this time period neighborhood social vulnerability emerged as a risk factor, with nursing homes in Region 3 experiencing higher deaths in neighborhoods with high dependency, single-parent households and households with disabilities. In Region 4 COVID-19 deaths were significantly higher in nursing homes in high minority and low-English speaking neighborhoods.

15-June to 5-July, 2020
From 15-June to 21-June, 2020 clusters of High-case counts + High-deaths and Low-case counts + High-deaths in nursing homes began to diminish along the eastern seaboard, with clusters of High-case counts + Low-deaths more pronounced. As the COVID-19 cases per square mile diffused across states in the Northeast and Midwest, however, there was also an increase in Low-case counts + High-deaths in areas outside of the original epicenters. A similar phenomenon was observed in the South (see Fig 5A). These north-south trends persisted through the next week (22-June to 28-June, 2020) and the week thereafter (29-June to 5-July, 2020) (see Fig 5B and 5C).
Table 3 shows factors that underlie these trends from 15-June to 5-July, 2020. Confirmed COVID-19 cases persisted as a significant risk factor for COVID-19 deaths in HHS Regions 2, 4 and 5, with staff confirmed COVID-19 a significant risk factor in HHS Regions 3, 5 and 6. Furthermore, the admission of COVID-19 patients from hospitals to nursing homes emerged as a risk factor in HHS Regions 4 and 6, these findings demonstrating the diffusion of known risk factors for COVID-19 deaths in nursing homes into the South. Nursing shortages also emerged as a significant risk factor for COVID-19 deaths in HHS Region 4 while persisting in HHS Region 5. In HHS Regions 2 and 6 the risk of death in mid-sized (100-249 beds) and large (250+ beds) nursing homes was not significantly greater than in small (<50 beds) nursing homes, this phenomenon persisting in HHS Region 2 but new in HHS Region 6. Finally, risk factors relating to social vulnerability persisted in Region 3, with higher COVID-19 deaths continuing in neighborhoods with high dependency, single-parent households and households with disabilities. In Region 5 COVID-19 deaths were significantly elevated in high minority and low-English speaking neighborhoods. In Region 4 COVID-19 deaths in nursing homes persisted in neighborhoods with high minority and low-English speaking populations. In Region 6 COVID-19 deaths in nursing homes were significantly higher in urban areas.

6-July to 26-July, 2020
From 6-July to 12-July, 2020 clusters of High-case counts + Low-deaths were prominent in the Northeast, demonstrating a sustained transition from clusters of High-case counts + High-deaths and Low-case counts + High-deaths. Clusters of High- or Low-case counts + High-deaths were largely observed outside of urban areas in the Northeast and Midwest but were now dispersed throughout the South and Southwest (see Fig 6A); these patterns persisted the following week (13-July to 19-July, 2020) and the week thereafter (20-July to 26-July, 2020) (see Fig 6B and 6C). By 26-July, 2020 the epicenter of nursing home COVID-19 deaths was in the South with emergence into the Southwest.
Table 4 shows factors that underlie these trends from 6-July to 26-July, 2020. Confirmed COVID-19 cases and staff confirmed with COVID-19 were significant risk factors for COVID-19 deaths in HHS Regions 4 and 6 of the South. Admissions of COVID-19 patients into nursing homes emerged as a risk factor in HHS Region 3 and continued to be a risk factor in HHS Region 4. In the Northeast (Regions 2 and 3) COVID-19 deaths in nursing homes were significantly higher in urban areas and in neighborhoods with high dependency, single-parent households and households with disabilities. In HHS Region 3 high COVID-19 deaths were also observed in nursing homes in neighborhoods with high minority and low-English speaking residents, a new phenomenon for HHS Region 3.

Discussion
This study analyzed the CMS data on COVID-19 cases and deaths in nursing homes from 1-January to 26-July, 2020. Weekly maps were created to demonstrate regional COVID-19 spatial diffusion. Risk factors for COVID-19 death in nursing homes within four time periods revealed three universal findings. The first universal risk factor for COVID-19 nursing home deaths was staff confirmed COVID-19, occurring across HHS Regions 1, 2, 3, 5 and 9 (1-January to 24-May, 2020), HHS Regions 1, 2, 3, 4 and 5 (25-May to 14-June, 2020), HHS Regions 3, 5 and 6 (15-June to 5-July, 2020) and HHS Regions 4 and 6 (6-July to 26-July, 2020). On 29-May, 2020 "Considerations for Preventing the Spread of COVID-19 in Assisted Living Facilities" was published by the CDC [27]. Testing protocols during this time period were based on staff's signs and symptoms. A previous study [7], however, found that between 18-May and 19-July, 2020 there were severe shortages of PPE in addition to staff shortages, which may in part be explained by staff diagnosed with COVID-19. In previous literature nursing shortages were not observed in mid-April (22-April to 29-April, 2020) in 23 states [14], findings consistent with this study from 1-January to 24-May, 2020. This study observed nursing shortages as early as 25-May, 2020 in HHS Region 5 and subsequently through 5-July, 2020 in HHS Regions 2, 4 and 5, likely due to the high infection rates among staff and/or patients at this time. These findings were consistent with Sugg et al. (2021) [8], who found significant shortages of staff working in nursing homes with COVID-19 patients through 30-June, 2020, with this study showing moderately significant shortages among Registered Nursing staff and highly significant shortages among aide staff, findings that are informative in further understanding the type of nursing staff shortages. In this study, nursing shortages were also observed from 6-July to 26-July, 2020 in HHS Regions 4 and 6, about three weeks after the observed north-south diffusion.
According to Hege et al. (2022) [9], who used CMS COVID-19 data through 31-January, 2021, nursing shortages continued to be a significant risk factor for COVID-19 infection rates beyond 26-July, 2020.
A second universal finding was that the transfer of COVID-19 patients from hospitals into nursing homes had a significant impact on nursing home COVID-19 related deaths in HHS Regions 2 and 5 (1-January to 24-May, 2020) and HHS Region 5 (25-May to 14-June, 2020). From 1-January to 24-May, 2020 the case-fatality rates in HHS Region 2 were highest in New York (44.4) and in HHS Region 5 in Michigan (36.4). From 25-May to 14-June, 2020 the case-fatality rates in HHS Region 2 were highest in New Jersey (31.0) and New York (22.3) and in HHS Region 5 in Michigan (63.6), Minnesota (46.8) and Illinois (38.4). Following the north-south diffusion of COVID-19, the practice of transferring hospitalized patients with COVID-19 into nursing homes for additional care and its effect on receiving nursing home deaths was significant in HHS Regions 4 and 6 (15-June to 5-July, 2020) and HHS Regions 3 and 4 (6-July to 26-July, 2020). The subsequent case-fatality rates in states in the South, however, were substantially lower than in the Northeast. These noteworthy findings suggest that although the practice of transferring COVID-19 patients from hospitals into nursing homes continued over time in the South, other practices must have been in place to avoid high case-fatality rates.
Finally, nursing home location and type of neighborhood social vulnerability did not emerge as significant risk factors for COVID-19 deaths until after the beginning of the pandemic, indicative of early undetected COVID-19 transmission across nursing homes countrywide. Thereafter, nursing homes in urban areas were the first to show high case-fatality rates for COVID-19 among residents. The two most common types of social vulnerability were neighborhoods with high-minority and non-English speaking residents and neighborhoods with a high proportion of dependents, single-parent households and households with disabilities. These findings are similar to those of Sugg et al. [8], who found important county-level characteristics significant for nursing home COVID-19 infection, including higher unemployment, higher average gross rent, higher percentage of occupied houses with no vehicle available and higher percent African American residents. Travers et al. [15] also found that county-level resources, rurality and counties with a high proportion of Black residents in part explained high rates of COVID-19 and deaths in nursing homes nationwide. Chatterjee et al. [14], Hege et al. [9] and Sugg et al. [8] found an increase in COVID-19 reporting in nursing homes in counties with high levels of COVID-19 infection; thus future research could focus on understanding the composition of populations in counties with high infection rates, to assess the increased vulnerability of minority population groups and households with disabilities to COVID-19 transmission.

Limitations
This study has several limitations. First, the Nursing Home COVID-19 Public File includes data reported by all certified Medicare skilled nursing homes and Medicaid nursing facilities to the National Healthcare Safety Network Long Term Care Facility Module [18]. These data therefore do not include CMS assisted living facilities or intermediate care facilities for individuals with intellectual disabilities or private nursing homes that do not participate in CMS. Second, the nursing home surveillance data may include delayed reports and thus the findings may change slightly as the module's data are updated. This study utilized the original data for surveillance from 1-January to 26-July, 2020 provided by the CDC-CMS [4] to get a snapshot of the beginning of the COVID-19 epidemic in nursing homes in the U.S., the same data used to make early policy and programmatic healthcare decisions. Future research could reanalyze these data using current surveillance data on nursing homes to detect any potential differences in reporting and results. Third, those residents confirmed with COVID-19 who were transferred to hospitals may have died in the hospital and therefore were not counted in these data, resulting in an underestimation of COVID-19 infection/transmission in nursing homes. This limitation may further explain the variation in COVID-19 cases and deaths across time periods in the highly populated HHS Region 2. Fourth, the transmission of COVID-19 between nursing home residents and staff and vice versa, and subsequent staff shortages, are not known. Future research may be able to disentangle these questions using testing results in nursing homes with high-reporting data over time. Finally, future research could compare these results with state-specific nursing home COVID-19 data to learn more about the decentralization of COVID-19 healthcare policies and practices, in relation to federal guidelines, including facility-level quality assurance ratings.

Conclusions
Studying COVID-19 in nursing homes from 1-January to 26-July, 2020 provided an opportunity to observe early regional trends in diffusion and risk factors for premature COVID-19 related deaths. This study found an early north-south diffusion of COVID-19 in nursing homes, suggesting that improvements in communication between Health and Human Service administrators and the timely transmission of "Lessons Learned" to state health departments will further help to prevent premature nursing home deaths in future pandemics.

Fig 1. Reference map of states within U.S. Department of Health and Human Services (HHS) Regions. Source: [17]. https://doi.org/10.1371/journal.pone.0308339.g001

From 15-June to 21-June, 2020 clusters of High-case counts + High-deaths and Low-case counts + High-deaths in nursing homes began to diminish along the eastern seaboard, with clusters of High-case counts + Low-deaths more pronounced. As the COVID-19 cases per

From 6-July to 12-July, 2020 clusters of High-case counts + Low-deaths were prominent in the Northeast, demonstrating a sustained transition from clusters of High-case counts + High-deaths and Low-case counts + High-deaths. Clusters of High- or Low-case counts + High-deaths were largely observed outside of urban areas in the Northeast and Midwest but now dispersed throughout the South and Southwest (see Fig 6A) the following week (13-July to 19-July, 2020) and the week thereafter (20-July to 26-July, 2020) (see Fig 6B and 6C). By 26-July, 2020 the epicenter of nursing home COVID-19 deaths was in the South with emergence into the Southwest.

Table 3. Estimating nursing home COVID-19 deaths by U.S. Health and Human Service (HHS) Regions, 15-June to 5-July, 2020: Zero-inflated negative binomial models (see note 1).
1 Intercept: negative binomial model to estimate COVID-19 death counts offset by facility's occupancy; Inf_Intercept: logistic models to estimate zeros (SAS v. 9.3), controlling for testing and presence vs. absence of a ventilator unit.
2 SV = Social Vulnerability.